SAW: A Method to Identify Splicing Events from RNA-Seq Data Based on Splicing Fingerprints
نویسندگان
چکیده
Splicing event identification is one of the most important issues in the comprehensive analysis of transcription profile. Recent development of next-generation sequencing technology has generated an extensive profile of alternative splicing. However, while many of these splicing events are between exons that are relatively close on genome sequences, reads generated by RNA-Seq are not limited to alternative splicing between close exons but occur in virtually all splicing events. In this work, a novel method, SAW, was proposed for the identification of all splicing events based on short reads from RNA-Seq. It was observed that short reads not in known gene models are actually absent words from known gene sequences. An efficient method to filter and cluster these short reads by fingerprint fragments of splicing events without aligning short reads to genome sequences was developed. Additionally, the possible splicing sites were also determined without alignment against genome sequences. A consensus sequence was then generated for each short read cluster, which was then aligned to the genome sequences. Results demonstrated that this method could identify more than 90% of the known splicing events with a very low false discovery rate, as well as accurately identify, a number of novel splicing events between distant exons.
منابع مشابه
MATS: a Bayesian framework for flexible detection of differential alternative splicing from RNA-Seq data
Ultra-deep RNA sequencing has become a powerful approach for genome-wide analysis of pre-mRNA alternative splicing. We develop MATS (multivariate analysis of transcript splicing), a bayesian statistical framework for flexible hypothesis testing of differential alternative splicing patterns on RNA-Seq data. MATS uses a multivariate uniform prior to model the between-sample correlation in exon sp...
متن کاملARH-seq: identification of differential splicing in RNA-seq data
The computational prediction of alternative splicing from high-throughput sequencing data is inherently difficult and necessitates robust statistical measures because the differential splicing signal is overlaid by influencing factors such as gene expression differences and simultaneous expression of multiple isoforms amongst others. In this work we describe ARH-seq, a discovery tool for differ...
متن کاملPVAAS: identify variants associated with aberrant splicing from RNA-seq
MOTIVATION RNA-seq has been widely used to study the transcriptome. Comparing to microarray, sequencing-based RNA-seq is able to identify splicing variants and single nucleotide variants in one experiment simultaneously. This provides unique opportunity to detect variants that associated with aberrant splicing. Despite the popularity of RNA-seq, no bioinformatics tool has been developed to leve...
متن کاملSplAdder: identification, quantification and testing of alternative splicing events from RNA-Seq data
MOTIVATION Understanding the occurrence and regulation of alternative splicing (AS) is a key task towards explaining the regulatory processes that shape the complex transcriptomes of higher eukaryotes. With the advent of high-throughput sequencing of RNA (RNA-Seq), the diversity of AS transcripts could be measured at an unprecedented depth. Although the catalog of known AS events has grown ever...
متن کاملLong Non-Coding RNA and Alternative Splicing Modulations in Parkinson's Leukocytes Identified by RNA Sequencing
The continuously prolonged human lifespan is accompanied by increase in neurodegenerative diseases incidence, calling for the development of inexpensive blood-based diagnostics. Analyzing blood cell transcripts by RNA-Seq is a robust means to identify novel biomarkers that rapidly becomes a commonplace. However, there is lack of tools to discover novel exons, junctions and splicing events and t...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره 5 شماره
صفحات -
تاریخ انتشار 2010